-
-
Notifications
You must be signed in to change notification settings - Fork 18.5k
ENH: Optimize putmask implementation for CoW #51268
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a EABackedBlock.setitem
as well, that probably needs to be updated to get the same signature as Block.setitem?
@@ -974,10 +974,19 @@ def setitem(self, indexer, value) -> Block: | |||
# checking lib.is_scalar here fails on |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't comment on it, but a few lines above we have:
except LossySetitemError:
# current dtype cannot store value, coerce to common dtype
nb = self.coerce_to_target_dtype(value)
return nb.setitem(indexer, value)
Is it theoretically possible that the dtype cannot store the value, but that the astype operation can still be done with a view? And should we in that case pass through using_cow=using_cow
to the recursive setitem
call?
I can't directly think of a case (maybe an EA that is backed by object array but can't hold the object you are setting?), but maybe the safest option is to just pass it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm I was thinking about it this as well, but I could not think of a case where we would get a LossySetitemError and then would not make a copy in the astype. Can add it if you prefer anyways
pandas/core/internals/blocks.py
Outdated
|
||
if using_cow and self.refs.has_reference(): | ||
values = values.copy() | ||
self = type(self)( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems we have a mix of using type(self)(..)
and self.make_block_same_class(..)
. The latter avoids having to pass mgr_locs and ndim)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea, switched.
if using_cow: | ||
return [self.copy(deep=False)] | ||
return [self] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It also doesn't necessarily harm to always do return [self.copy(deep=False)]
to avoid the extra if branch?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think so, but it could also mess up something I am not thinking about right now. I‘d like to keep as is if you don’t feel strongly about it. Could our caching mechanism be a problem (only theoretically, because we should clear the cache when getting here I think), but there might be edge cases we are missing right now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's fine to leave as is for now. It will also be an easy clean-up once CoW is enforced.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep exactly my thinking as well
putmask_without_repeat(values.T, mask, casted) | ||
if using_cow: | ||
return [self.copy(deep=False)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't need to shallow copy if we already did a hard copy above?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, but the intermediate block will go out of scope, so making a shallow copy here will still only have a single ref in that case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Exactly, it does not hurt, that’s why I wanted to avoid making the check more complex
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is one typing error:
pandas/core/internals/blocks.py:1615: error: Incompatible types in assignment (expression has type "Block", variable has type "EABackedBlock") [assignment]
The failing test in the CoW build is the flaky |
CoW failure unrelated (flaky test) |
:) funny timing |
I was first! ;) |
Yep! was a tad to slow :( |
doc/source/whatsnew/vX.X.X.rst
file if fixing a bug or adding a new feature.